Reference: Government Contract No. N00014-09-C-0050, “Enhancing Simulation-based
Training Adversary Tactics via Evolution (ESTATE)”
Charles  River  Analytics  Contract  No.  C08098 


Subject:  Contractor’s  Status  Report:  Quarterly  Status  Report  #6 

Reporting  Dates:  3/15/2009  -  6/15/2010 


Dear  Dr.  Hawkins, 

The  following  is  the  Contractor’s  Quarterly  Status  Report  for  the  subject  contract  for  the 
indicated  period.  During  this  reporting  period  we  have  concentrated  on  Task  4:  Develop  Trainee 
Model  Processing  and  Task  6:  Sim-based  Training  System  Integration. 

1. Summary of Progress

During  this  reporting  period,  we  have  leveraged  prior  analysis  of  the  MoneyBee  dataset  with  our 
academic  partner  to  further  analyze  the  student  learning  of  the  task.  We  have  also  begun  the 
design  of  a  simulated  training  context  and  ESTATE  architecture  implementation  to  address  this 
context. 

1.1  Analysis  of  Learning  in  the  MoneyBee  Dataset 

The  goal  of  this  task  is  to  discover  a  method  to  measure  student  learning  and  to  determine  if 
students  are  gaining  proficiency  in  this  pre-algebra  activity.  This  method  will  augment  our 
student  assessment  and  challenge  adaptation  techniques  by  providing  a  better  estimate  of  student 
ability  and  Zone  of  Proximal  Development  (ZPD).  Earlier  exploration  of  the  MoneyBee  Dataset 
indicated  that  the  students  score  better  as  they  attempt  more  problems,  but  because  of  student 
selection  of  problems,  it  was  unclear  whether  the  students  were  improving  or  simply  choosing 
easier  problems  to  attempt  (Rosenberg,  2009).  Also,  we  discovered  that  our  heuristic  estimate  of 
problem  difficulty  correlates  with  the  time  to  complete  a  problem  (Rosenberg,  2010).  The  results 
of  the  current  analysis  below  show  that  as  the  number  of  problems  attempted  by  a  student 
increases,  1)  the  mean  and  median  difficulty  increases  and  2)  the  mean  and  median  time  to 
complete  decreases.  This  provides  strong  evidence  for  learning  on  the  MoneyBee  task. 

MoneyBee  is  a  coin  algebra  activity.  The  student  is  given  a  sum  and  a  number  of  coins  and  has 
to pick out which coins add up to the sum. A session consists of paired exercises that continue
until a student completes five problems. In each exercise, each student creates a problem for the
other to solve, then receives the partner’s problem along with a graphical workbench for solving
it. The record of each exercise collects a detailed timeline, down to a tenth of a
second,  recording  when  players  add  and  subtract  coins  towards  solving  the  problem  they  are 
presented.  When  a  student  solves  a  problem,  both  the  student  and  his  or  her  partner  receive 
points  equal  to  the  estimated  problem  difficulty.  Thus,  students  are  incentivized  to  choose  the 
most  difficult  problems  they  believe  their  partner  can  solve. 

Our  difficulty  heuristic  performs  the  following  calculation  to  estimate  difficulty.  Beginning  with 
the  initial  amount  of  cents: 

1. Remove the odd pennies (the amount modulo five).

2. Search for the solution by adding a single coin at a time in a breadth-first search (first
quarters, then dimes, then nickels, then pennies), until the problem has only one coin type
remaining.

This heuristic assumes that players will attempt larger-valued coins first, and that players
mentally search for a solution by considering all alternatives in sequence. Because the number
of nodes explored by a breadth-first search grows exponentially with search depth, we take the
logarithm of the heuristic as the estimate.
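The two steps above can be sketched in Python as follows. This is only one possible reading of
the heuristic, not the project implementation: the function names, the (cents, coins) state
representation, the use of the explored-node count as the raw heuristic value, and the stopping
test ("the remaining cents can be paid with the remaining coins of a single type") are all
assumptions made for illustration.

```python
import math
from collections import deque

# Coin values in the order the heuristic tries them: quarters, dimes, nickels, pennies.
COINS = (25, 10, 5, 1)


def single_coin_type_left(cents, coins):
    """True when the remaining cents can be paid with `coins` coins of one type."""
    return any(cents == value * coins for value in COINS)


def estimate_difficulty(total_cents, num_coins):
    """Return log(nodes explored) under an assumed reading of the heuristic."""
    # Step 1: remove the odd pennies (the amount modulo five).
    odd_pennies = total_cents % 5
    start = (total_cents - odd_pennies, num_coins - odd_pennies)
    if start[1] < 0:
        raise ValueError("too few coins to cover the odd pennies")

    # Step 2: breadth-first search, adding one coin at a time (largest first).
    queue = deque([start])
    explored = 0
    while queue:
        cents, coins = queue.popleft()
        explored += 1
        if single_coin_type_left(cents, coins):
            # Node counts grow exponentially with depth, so report the logarithm.
            return math.log(explored)
        if coins == 0:
            continue  # dead end: coins used up but cents remain
        for value in COINS:
            if value <= cents:
                queue.append((cents - value, coins - 1))
    raise ValueError("no combination of coins reaches the sum")
```

Under these assumptions, a problem that is immediately solvable with one coin type (for example,
50 cents in two coins) receives the minimum estimate, while problems that require mixing coin
types receive larger values.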

Figure  1-1  shows  a  graph  of  the  estimated  problem  difficulty  per  session.  As  students  play  more 
sessions  they  are  given  problems  with  higher  estimated  difficulty.  Thus,  as  students  play  more 
sessions  their  partners  estimate  that  they  will  be  able  to  solve  more  difficult  problems.  Figure  1-2 
and  Figure  1-3  show  the  relation  between  number  of  sessions  played  and  mean  and  median  time 
to  completion.  As  students  play  more  sessions  their  time  to  complete  each  game  decreases, 
indicating  that  they  are  able  to  solve  these  problems  with  more  proficiency.  Together,  these 
analyses  indicate  that  students  are  learning  through  challenges,  solving  more  difficult  problems 
in  less  time  as  they  gain  experience. 
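The per-session summaries plotted in these figures could be computed along the following lines.
This is a minimal sketch, assuming each exercise record has been reduced to a hypothetical
(session number, estimated difficulty, seconds to complete) tuple; the actual MoneyBee record
format is not described here.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_session(records):
    """Group (session, difficulty, seconds) tuples and summarize each session."""
    by_session = defaultdict(list)
    for session, difficulty, seconds in records:
        by_session[session].append((difficulty, seconds))

    summary = {}
    for session in sorted(by_session):
        difficulties = [d for d, _ in by_session[session]]
        times = [t for _, t in by_session[session]]
        summary[session] = {
            "mean_difficulty": mean(difficulties),
            "median_difficulty": median(difficulties),
            "mean_time": mean(times),
            "median_time": median(times),
        }
    return summary
```

Reporting both the mean and the median guards against a few very long or very short games
dominating the trend.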


Figure 1-1: Estimated problem difficulty per session.

Figure 1-2: Median average game time per session

Figure 1-3: Mean average game time per session

Our next steps with the MoneyBee dataset will be to improve our visualizations of the strategy
choices, developing a strategy “heat map” to provide a visual overview of how the
students move between states in the problem space. For instance, choices that are made more
often may be drawn with thicker arrows, making the most common paths more apparent.
Comparing  these  visualizations  between  inexperienced  and  experienced  players  may  provide 
information  as  to  how  strategies  evolve  due  to  experience.  We  can  then  use  this  analysis  to  create 
models  of  different  players  for  future  experiments. 

1.2  Development  of  Simulated  Training  Context  and  Corresponding  ESTATE 
Architecture  Implementation 

Previously,  we  have  demonstrated  the  use  of  the  MaxSolve  monotonic  solution  concept  (De 
Jong, 2005) for coevolution. Ficici (2004) identifies solution concepts as a method to analyze the
relationship between the selection of individuals in coevolution and the meeting of the overall
goals of the coevolutionary process. A solution concept indicates which individuals to keep for
future populations; thus, a solution concept is a type of memory mechanism. A well-functioning
solution concept will drive the population towards the goals (e.g., being a better game player),
while a poorly functioning solution concept will cause the population to flounder due to one or more
coevolutionary  pathologies. 

Our criteria for selecting a solution concept were that 1) the solution concept performed well in
practice and 2) the solution concept did not further constrain the problem. Performance
comparisons between candidate solution concepts (De Jong, 2005; De Jong & Bucci, 2006),
communications  with  authors  (Bucci,  2010),  and  consultation  with  our  academic  partner,  an 
expert  in  this  area,  led  us  to  choose  the  MaxSolve  solution  concept  as  the  best  candidate  for 
implementation  and  testing.  MaxSolve  has  exhibited  high  performance  on  a  number  of  different 
challenges,  and  it  does  not  place  any  additional  constraints  on  our  problem  space.  We  previously 
implemented  MaxSolve  and  tested  the  technique  on  the  COMPARE-ON-ONE,  Challenge  tree, 
and  Nim  games,  showing  that  MaxSolve  performed  well  in  these  domains  (Rosenberg,  2010). 

Our  next  step  is  to  design  and  implement  a  simulated  training  context  to  test  the  performance  of 
the  ESTATE  approach  with  MaxSolve  coevolution.  As  an  initial  implementation,  the  challenge 
tree  approach,  shown  in  Figure  1-4,  offers  a  number  of  advantages.  First,  the  challenge  structure 
is  simple,  and  will  ease  the  diagnosis  and  debugging  of  implementation  issues.  Second,  our 
coevolutionary  technique  has  been  tested  on  this  structure  and  it  performs  well.  Third,  this  type 
of  challenge  can  be  readily  adapted  to  a  number  of  challenge  domains. 

We  plan  to  first  implement  a  maze  challenge:  trainees  are  dropped  inside  a  room  in  a  virtual 
maze  and  traverse  the  challenge  tree  by  selecting  doors  to  walk  through,  without  backtracking. 
Each  room  is  decorated  with  clues  that  indicate  to  the  trainee  which  door  to  choose  to  stay  on  a 
path  to  an  exit.  For  instance,  a  house  plant  and  a  picture  of  a  sailboat  could  indicate  choosing  the 
leftmost  door.  By  repeatedly  attempting  the  challenges,  trainees  are  taught  how  the  clues 
combine  to  indicate  a  door  choice.  Following  this  initial  implementation,  we  plan  to  implement  a 
cultural  training  application.  Here,  the  trainee  is  presented  with  a  conversational  goal  and  a 
current  conversational  state  in  a  dialog  tree.  Based  on  the  current  state,  the  trainee  must 
repeatedly  choose  actions  or  lines  of  dialog  until  the  interaction  is  completed,  either  with  success 
or failure. We map the nodes of the challenge tree to conversation states, and the edges of the
tree to trainee dialog or action choices. By repeatedly attempting new challenges, the trainee
learns how indicators about the current conversational state can be used to choose actions.


Figure 1-4: An example challenge tree game (4 levels). The trainee begins at a node and chooses
edges to move down the tree until a leaf node is reached. The leaf nodes are either marked
as successes (circled) or failures (not circled).
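A challenge tree of this kind, applied to the maze context, might be represented as in the
sketch below. The `Room` class, its fields, and the `attempt` function are hypothetical names
introduced for illustration; they are not part of the ESTATE implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Room:
    """One node of the challenge tree (a maze room); names are hypothetical."""
    clues: Tuple[str, ...] = ()  # decorations visible to the trainee
    doors: List["Room"] = field(default_factory=list)  # child rooms, left to right
    is_exit: bool = False  # only meaningful for leaf rooms


def attempt(root: Room, choose_door: Callable[[Room], int]) -> bool:
    """Walk from the root to a leaf using the trainee's door choices."""
    room = root
    while room.doors:  # no backtracking: the trainee only moves down the tree
        room = room.doors[choose_door(room)]
    return room.is_exit  # success if the leaf reached is an exit
```

The same structure would serve the cultural trainer by treating rooms as conversation states
and doors as dialog or action choices.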

To  implement  this  simulated  training  context  and  gather  performance  data,  we  must  provide 
initial,  prototype  implementations  of  the  entire  ESTATE  architecture,  shown  in  Figure  1-5.  Our 
implementation  of  the  Training  System  must  include  a  Simulated  Environment.  For  the  maze 
challenges  this  environment  will  be  a  definition  of  the  maze  structure  and  of  the  protocol  for 
decorating  rooms,  and,  for  the  cultural  trainer,  this  environment  will  be  a  definition  of  the 
conversation  tree  and  the  method  for  creating  dialog  and  action  options.  The  Training  System 
will  also  simulate  Trainee  Models,  which  may  include  trainees  that  exhibit  a  number  of  learning 
bugs  (e.g.,  failure  to  recognize  the  decoration  mechanism,  ignoring  one  or  more  features,  slow 
learning,  or  general  forgetfulness).  The  Trainee  Model  Extractor  will  use  a  diagnosis  routine  to 
generate  a  number  of  trainee  models.  This  routine  will  use  the  known  trainee  moves  to  sample 
from  the  possible  strategy  space  of  the  trainee,  producing  a  number  of  simulated  individuals  as 
the initial population of coevolution. These individuals will be sent to the problem generator,
which will run MaxSolve coevolution over these individuals and an archive of challenges as the
Adaptation technique. The problem generator will use an estimate of the ZPD to stop the
coevolution at a specified point and send the next top challenge to the trainee, repeating the
cycle. The ZPD estimate may be based on the number of generations in coevolution, the number of
new tests discovered, or some distance calculation between individuals or tests in the population.
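As an illustration of the diagnosis step described above, the sketch below samples simulated
trainees whose strategies agree with the moves already observed. It assumes, purely for
illustration, that a strategy is a mapping from a room's clues to a door index and that
unobserved choices are drawn uniformly at random; the actual Trainee Model Extractor design is
not yet fixed, and all names here are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Clues = Tuple[str, ...]
Strategy = Dict[Clues, int]  # maps a room's clues to a chosen door index


def sample_trainee_models(
    known_moves: Strategy,
    clue_sets: Sequence[Clues],
    num_doors: int,
    population_size: int,
    rng: Optional[random.Random] = None,
) -> List[Strategy]:
    """Sample strategies consistent with the trainee's observed moves."""
    rng = rng or random.Random()
    population = []
    for _ in range(population_size):
        strategy = {}
        for clues in clue_sets:
            if clues in known_moves:
                # Observed situation: keep the trainee's actual choice.
                strategy[clues] = known_moves[clues]
            else:
                # Unobserved situation: sample a door uniformly (an assumption).
                strategy[clues] = rng.randrange(num_doors)
        population.append(strategy)
    return population
```

The resulting list would play the role of the initial population handed to the coevolution step.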


Figure  1-5:  ESTATE  Challenge-Response  System  Architecture 

Our  next  steps  are  to  complete  the  design  of  the  simulated  training  context  and  begin 
implementation  of  the  maze  and  cultural  training  applications.  We  aim  to  have  both  a  set  of 
simulated trainee models and a simple user interface for human users to test the system.

3. Scheduled Items

In  the  next  reporting  period  we  plan  to  address  the  following  items: 

•  Further investigate trainee strategy modeling.

•  Design and begin implementation of a simulation of 1) trainees attempting challenges, 2)
assessment of trainee skill and strategy, and 3) challenges evolving.

•  Continue  MoneyBee  strategy  data  analysis  and  visualization. 

Sincerely, 


Brad  Rosenberg 
Principal  Investigator 